Overview

Dataset statistics

Number of variables: 13
Number of observations: 1586614
Missing cells: 68148
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 157.4 MiB
Average record size in memory: 104.0 B
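
The percentage and per-record figures above can be re-derived from the raw counts. The following is a minimal sketch of that arithmetic, not the profiler's own code. It assumes that the missing-cell percentage is taken over all cells (variables times observations) and that 1 MiB is 2**20 bytes; both assumptions are consistent with the reported values.

```python
# Values copied from the dataset statistics above.
n_variables = 13
n_observations = 1_586_614
missing_cells = 68_148
total_memory_mib = 157.4

# Assumption: the percentage is computed over all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes.
avg_record_bytes = total_memory_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```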

Variable types

NUM: 9
CAT: 4

Warnings

brewery_name has a high cardinality: 5742 distinct values
review_profilename has a high cardinality: 33387 distinct values
beer_style has a high cardinality: 104 distinct values
beer_name has a high cardinality: 56857 distinct values
beer_abv has 67785 (4.3%) missing values

Reproduction

Analysis started: 2020-10-30 05:38:17.627596
Analysis finished: 2020-10-30 05:39:36.972371
Duration: 1 minute and 19.34 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

brewery_id
Real number (ℝ≥0)

Distinct: 5840
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3130.099202
Minimum: 1
Maximum: 28003
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 1
5th percentile: 30
Q1: 143
Median: 429
Q3: 2372
95th percentile: 16866
Maximum: 28003
Range: 28002
Interquartile range (IQR): 2229

Descriptive statistics

Standard deviation: 5578.103987
Coefficient of variation (CV): 1.782085368
Kurtosis: 3.408354127
Mean: 3130.099202
Median Absolute Deviation (MAD): 366
Skewness: 2.083747568
Sum: 4966259215
Variance: 31115244.1
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values

Value | Count | Frequency (%)
35 | 39444 | 2.5%
10099 | 33839 | 2.1%
147 | 33066 | 2.1%
140 | 28751 | 1.8%
287 | 25191 | 1.6%
132 | 24083 | 1.5%
1199 | 20004 | 1.3%
345 | 19479 | 1.2%
220 | 16837 | 1.1%
30 | 16107 | 1.0%
Other values (5830) | 1329813 | 83.8%

Smallest values

Value | Count | Frequency (%)
1 | 1357 | 0.1%
2 | 40 | < 0.1%
3 | 5357 | 0.3%
4 | 7321 | 0.5%
5 | 728 | < 0.1%

Largest values

Value | Count | Frequency (%)
28003 | 2 | < 0.1%
28000 | 1 | < 0.1%
27984 | 1 | < 0.1%
27980 | 3 | < 0.1%
27945 | 1 | < 0.1%

brewery_name
Categorical

HIGH CARDINALITY

Distinct: 5742
Distinct (%): 0.4%
Missing: 15
Missing (%): < 0.1%
Memory size: 12.1 MiB

Top categories

Value | Count
Boston Beer Company (Samuel Adams) | 39444
Dogfish Head Brewery | 33839
Stone Brewing Co. | 33066
Sierra Nevada Brewing Co. | 28751
Bell's Brewery, Inc. | 25191
Other values (5737) | 1426308

Most frequent values

Value | Count | Frequency (%)
Boston Beer Company (Samuel Adams) | 39444 | 2.5%
Dogfish Head Brewery | 33839 | 2.1%
Stone Brewing Co. | 33066 | 2.1%
Sierra Nevada Brewing Co. | 28751 | 1.8%
Bell's Brewery, Inc. | 25191 | 1.6%
Rogue Ales | 24083 | 1.5%
Founders Brewing Company | 20004 | 1.3%
Victory Brewing Company | 19479 | 1.2%
Lagunitas Brewing Company | 16837 | 1.1%
Avery Brewing Company | 16107 | 1.0%
Other values (5732) | 1329798 | 83.8%

[Figure: Frequencies of value counts]

Unique

Unique: 672
Unique (%): < 0.1%

[Figure: Histogram of lengths of the category]

Length

Max length: 66
Median length: 23
Mean length: 23.61012761
Min length: 3

review_time
Real number (ℝ≥0)

Distinct: 1577960
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1224089280
Minimum: 840672001
Maximum: 1326285348
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 840672001
5th percentile: 1071431292
Q1: 1173224188
Median: 1239202882
Q3: 1288568405
95th percentile: 1318389924
Maximum: 1326285348
Range: 485613347
Interquartile range (IQR): 115344217
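
The report types review_time as a plain real number, but its magnitude suggests seconds since the Unix epoch. That interpretation is an assumption and is not stated anywhere in the report. Under it, a minimal sketch for turning the reported minimum and maximum into calendar dates could look like this (the helper name `to_utc` is hypothetical):

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds: int) -> datetime:
    """Interpret an integer as seconds since 1970-01-01 00:00:00 UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Minimum and maximum of review_time, copied from the statistics above.
review_time_min = 840672001
review_time_max = 1326285348

print(to_utc(review_time_min).isoformat())
print(to_utc(review_time_max).isoformat())
```

If the assumption holds, the reviews would span August 1996 to January 2012, which is why statistics such as the mean or sum of this column are hard to read in raw form.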

Descriptive statistics

Standard deviation: 76544274.54
Coefficient of variation (CV): 0.06253161088
Kurtosis: -0.3136982976
Mean: 1224089280
Median Absolute Deviation (MAD): 54219357.5
Skewness: -0.7352727768
Sum: 1.942157189e+15
Variance: 5.859025965e+15
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values

Value | Count | Frequency (%)
1101772800 | 21 | < 0.1%
1031101200 | 8 | < 0.1%
926380801 | 8 | < 0.1%
897091201 | 7 | < 0.1%
1022112001 | 7 | < 0.1%
980812801 | 7 | < 0.1%
904867201 | 6 | < 0.1%
933033601 | 6 | < 0.1%
902966401 | 6 | < 0.1%
926294401 | 5 | < 0.1%
Other values (1577950) | 1586533 | > 99.9%

Smallest values

Value | Count | Frequency (%)
840672001 | 1 | < 0.1%
884390401 | 1 | < 0.1%
884649601 | 1 | < 0.1%
885340801 | 1 | < 0.1%
885427201 | 1 | < 0.1%

Largest values

Value | Count | Frequency (%)
1326285348 | 1 | < 0.1%
1326284970 | 1 | < 0.1%
1326276656 | 1 | < 0.1%
1326275049 | 1 | < 0.1%
1326274454 | 1 | < 0.1%

review_overall
Real number (ℝ≥0)

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.815580853
Minimum: 0
Maximum: 5
Zeros: 7
Zeros (%): < 0.1%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 2.5
Q1: 3.5
Median: 4
Q3: 4.5
95th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7206218681
Coefficient of variation (CV): 0.1888629532
Kurtosis: 1.631038958
Mean: 3.815580853
Median Absolute Deviation (MAD): 0.5
Skewness: -1.023968713
Sum: 6053854
Variance: 0.5192958767
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=10)]

Most frequent values

Value | Count | Frequency (%)
4 | 582764 | 36.7%
4.5 | 324385 | 20.4%
3.5 | 301817 | 19.0%
3 | 165644 | 10.4%
5 | 91320 | 5.8%
2.5 | 58523 | 3.7%
2 | 38225 | 2.4%
1.5 | 12975 | 0.8%
1 | 10954 | 0.7%
0 | 7 | < 0.1%

Smallest values

Value | Count | Frequency (%)
0 | 7 | < 0.1%
1 | 10954 | 0.7%
1.5 | 12975 | 0.8%
2 | 38225 | 2.4%
2.5 | 58523 | 3.7%

Largest values

Value | Count | Frequency (%)
5 | 91320 | 5.8%
4.5 | 324385 | 20.4%
4 | 582764 | 36.7%
3.5 | 301817 | 19.0%
3 | 165644 | 10.4%

review_aroma
Real number (ℝ≥0)

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.735636078
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 1
5th percentile: 2.5
Q1: 3.5
Median: 4
Q3: 4
95th percentile: 4.5
Maximum: 5
Range: 4
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.6976167288
Coefficient of variation (CV): 0.1867464374
Kurtosis: 1.145196752
Mean: 3.735636078
Median Absolute Deviation (MAD): 0.5
Skewness: -0.838530526
Sum: 5927012.5
Variance: 0.4866691003
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=9)]

Most frequent values

Value | Count | Frequency (%)
4 | 557383 | 35.1%
3.5 | 365312 | 23.0%
4.5 | 271450 | 17.1%
3 | 200030 | 12.6%
2.5 | 66359 | 4.2%
5 | 64117 | 4.0%
2 | 42566 | 2.7%
1.5 | 12524 | 0.8%
1 | 6873 | 0.4%

Smallest values

Value | Count | Frequency (%)
1 | 6873 | 0.4%
1.5 | 12524 | 0.8%
2 | 42566 | 2.7%
2.5 | 66359 | 4.2%
3 | 200030 | 12.6%

Largest values

Value | Count | Frequency (%)
5 | 64117 | 4.0%
4.5 | 271450 | 17.1%
4 | 557383 | 35.1%
3.5 | 365312 | 23.0%
3 | 200030 | 12.6%

review_appearance
Real number (ℝ≥0)

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.841641697
Minimum: 0
Maximum: 5
Zeros: 7
Zeros (%): < 0.1%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 3.5
Median: 4
Q3: 4
95th percentile: 4.5
Maximum: 5
Range: 5
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.6160927689
Coefficient of variation (CV): 0.160372262
Kurtosis: 1.738866541
Mean: 3.841641697
Median Absolute Deviation (MAD): 0.5
Skewness: -0.9024199172
Sum: 6095202.5
Variance: 0.3795702999
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=10)]

Most frequent values

Value | Count | Frequency (%)
4 | 674186 | 42.5%
3.5 | 318529 | 20.1%
4.5 | 288108 | 18.2%
3 | 166009 | 10.5%
5 | 65398 | 4.1%
2.5 | 39493 | 2.5%
2 | 25414 | 1.6%
1.5 | 6147 | 0.4%
1 | 3323 | 0.2%
0 | 7 | < 0.1%

Smallest values

Value | Count | Frequency (%)
0 | 7 | < 0.1%
1 | 3323 | 0.2%
1.5 | 6147 | 0.4%
2 | 25414 | 1.6%
2.5 | 39493 | 2.5%

Largest values

Value | Count | Frequency (%)
5 | 65398 | 4.1%
4.5 | 288108 | 18.2%
4 | 674186 | 42.5%
3.5 | 318529 | 20.1%
3 | 166009 | 10.5%

review_profilename
Categorical

HIGH CARDINALITY

Distinct: 33387
Distinct (%): 2.1%
Missing: 348
Missing (%): < 0.1%
Memory size: 12.1 MiB

Top categories

Value | Count
northyorksammy | 5817
BuckeyeNation | 4661
mikesgroove | 4617
Thorpe429 | 3518
womencantsail | 3497
Other values (33382) | 1564156

Most frequent values

Value | Count | Frequency (%)
northyorksammy | 5817 | 0.4%
BuckeyeNation | 4661 | 0.3%
mikesgroove | 4617 | 0.3%
Thorpe429 | 3518 | 0.2%
womencantsail | 3497 | 0.2%
NeroFiddled | 3488 | 0.2%
ChainGangGuy | 3471 | 0.2%
brentk56 | 3357 | 0.2%
Phyl21ca | 3179 | 0.2%
WesWes | 3168 | 0.2%
Other values (33377) | 1547493 | 97.5%

[Figure: Frequencies of value counts]

Unique

Unique: 10443
Unique (%): 0.7%

[Figure: Histogram of lengths of the category]

Length

Max length: 16
Median length: 9
Mean length: 8.961438636
Min length: 3

beer_style
Categorical

HIGH CARDINALITY

Distinct: 104
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.1 MiB

Top categories

Value | Count
American IPA | 117586
American Double / Imperial IPA | 85977
American Pale Ale (APA) | 63469
Russian Imperial Stout | 54129
American Double / Imperial Stout | 50705
Other values (99) | 1214748

Most frequent values

Value | Count | Frequency (%)
American IPA | 117586 | 7.4%
American Double / Imperial IPA | 85977 | 5.4%
American Pale Ale (APA) | 63469 | 4.0%
Russian Imperial Stout | 54129 | 3.4%
American Double / Imperial Stout | 50705 | 3.2%
American Porter | 50477 | 3.2%
American Amber / Red Ale | 45751 | 2.9%
Belgian Strong Dark Ale | 37743 | 2.4%
Fruit / Vegetable Beer | 33861 | 2.1%
American Strong Ale | 31945 | 2.0%
Other values (94) | 1014971 | 64.0%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 35
Median length: 18
Mean length: 17.86997972
Min length: 4

review_palate
Real number (ℝ≥0)

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.743701367
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 1
5th percentile: 2.5
Q1: 3.5
Median: 4
Q3: 4
95th percentile: 4.5
Maximum: 5
Range: 4
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.6822183634
Coefficient of variation (CV): 0.1822309785
Kurtosis: 1.303397287
Mean: 3.743701367
Median Absolute Deviation (MAD): 0.5
Skewness: -0.8691499712
Sum: 5939809
Variance: 0.4654218953
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=9)]

Most frequent values

Value | Count | Frequency (%)
4 | 606711 | 38.2%
3.5 | 338585 | 21.3%
4.5 | 253102 | 16.0%
3 | 206932 | 13.0%
2.5 | 62842 | 4.0%
5 | 62190 | 3.9%
2 | 38333 | 2.4%
1.5 | 11045 | 0.7%
1 | 6874 | 0.4%

Smallest values

Value | Count | Frequency (%)
1 | 6874 | 0.4%
1.5 | 11045 | 0.7%
2 | 38333 | 2.4%
2.5 | 62842 | 4.0%
3 | 206932 | 13.0%

Largest values

Value | Count | Frequency (%)
5 | 62190 | 3.9%
4.5 | 253102 | 16.0%
4 | 606711 | 38.2%
3.5 | 338585 | 21.3%
3 | 206932 | 13.0%

review_taste
Real number (ℝ≥0)

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.792860456
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 1
5th percentile: 2.5
Q1: 3.5
Median: 4
Q3: 4.5
95th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7319696099
Coefficient of variation (CV): 0.1929861692
Kurtosis: 1.341669306
Mean: 3.792860456
Median Absolute Deviation (MAD): 0.5
Skewness: -0.9734324438
Sum: 6017805.5
Variance: 0.5357795098
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=9)]

Most frequent values

Value | Count | Frequency (%)
4 | 541429 | 34.1%
4.5 | 336162 | 21.2%
3.5 | 324541 | 20.5%
3 | 166860 | 10.5%
5 | 83977 | 5.3%
2.5 | 66534 | 4.2%
2 | 41992 | 2.6%
1.5 | 15128 | 1.0%
1 | 9991 | 0.6%

Smallest values

Value | Count | Frequency (%)
1 | 9991 | 0.6%
1.5 | 15128 | 1.0%
2 | 41992 | 2.6%
2.5 | 66534 | 4.2%
3 | 166860 | 10.5%

Largest values

Value | Count | Frequency (%)
5 | 83977 | 5.3%
4.5 | 336162 | 21.2%
4 | 541429 | 34.1%
3.5 | 324541 | 20.5%
3 | 166860 | 10.5%

beer_name
Categorical

HIGH CARDINALITY

Distinct: 56857
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 12.1 MiB

Top categories

Value | Count
90 Minute IPA | 3290
India Pale Ale | 3130
Old Rasputin Russian Imperial Stout | 3111
Sierra Nevada Celebration Ale | 3000
Two Hearted Ale | 2728
Other values (56852) | 1571355

Most frequent values

Value | Count | Frequency (%)
90 Minute IPA | 3290 | 0.2%
India Pale Ale | 3130 | 0.2%
Old Rasputin Russian Imperial Stout | 3111 | 0.2%
Sierra Nevada Celebration Ale | 3000 | 0.2%
Two Hearted Ale | 2728 | 0.2%
Arrogant Bastard Ale | 2704 | 0.2%
Stone Ruination IPA | 2704 | 0.2%
Sierra Nevada Pale Ale | 2587 | 0.2%
Stone IPA (India Pale Ale) | 2575 | 0.2%
Pliny The Elder | 2527 | 0.2%
Other values (56847) | 1558258 | 98.2%

[Figure: Frequencies of value counts]

Unique

Unique: 18908
Unique (%): 1.2%

[Figure: Histogram of lengths of the category]

Length

Max length: 75
Median length: 19
Mean length: 20.45317513
Min length: 1

beer_abv
Real number (ℝ≥0)

MISSING

Distinct: 530
Distinct (%): < 0.1%
Missing: 67785
Missing (%): 4.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.042386753
Minimum: 0.01
Maximum: 57.7
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 0.01
5th percentile: 4.5
Q1: 5.2
Median: 6.5
Q3: 8.5
95th percentile: 11
Maximum: 57.7
Range: 57.69
Interquartile range (IQR): 3.3

Descriptive statistics

Standard deviation: 2.322525993
Coefficient of variation (CV): 0.3297924516
Kurtosis: 6.961811545
Mean: 7.042386753
Median Absolute Deviation (MAD): 1.5
Skewness: 1.543406148
Sum: 10696181.23
Variance: 5.394126987
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values

Value | Count | Frequency (%)
5 | 109144 | 6.9%
8 | 67744 | 4.3%
6 | 65383 | 4.1%
7 | 59460 | 3.7%
9 | 59183 | 3.7%
5.5 | 59010 | 3.7%
10 | 54780 | 3.5%
6.5 | 48369 | 3.0%
5.2 | 43268 | 2.7%
7.5 | 39978 | 2.5%
Other values (520) | 912510 | 57.5%
(Missing) | 67785 | 4.3%
ValueCountFrequency (%) 
0.015< 0.1%
 
0.0517< 0.1%
 
0.081< 0.1%
 
0.111< 0.1%
 
0.253< 0.1%
 
ValueCountFrequency (%) 
57.71< 0.1%
 
432< 0.1%
 
4176< 0.1%
 
39.443< 0.1%
 
397< 0.1%
 

beer_beerid
Real number (ℝ≥0)

Distinct: 66055
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21712.79428
Minimum: 3
Maximum: 77317
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 3
5th percentile: 213
Q1: 1717
Median: 13906
Q3: 39441
95th percentile: 62653
Maximum: 77317
Range: 77314
Interquartile range (IQR): 37724

Descriptive statistics

Standard deviation: 21818.336
Coefficient of variation (CV): 1.004860808
Kurtosis: -0.8339342225
Mean: 21712.79428
Median Absolute Deviation (MAD): 13217
Skewness: 0.6893969312
Sum: 3.444982338e+10
Variance: 476039785.7
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values

Value | Count | Frequency (%)
2093 | 3290 | 0.2%
412 | 3111 | 0.2%
1904 | 3000 | 0.2%
1093 | 2728 | 0.2%
92 | 2704 | 0.2%
4083 | 2704 | 0.2%
276 | 2587 | 0.2%
88 | 2575 | 0.2%
7971 | 2527 | 0.2%
11757 | 2502 | 0.2%
Other values (66045) | 1558886 | 98.3%

Smallest values

Value | Count | Frequency (%)
3 | 3 | < 0.1%
4 | 10 | < 0.1%
5 | 424 | < 0.1%
6 | 877 | 0.1%
7 | 659 | < 0.1%

Largest values

Value | Count | Frequency (%)
77317 | 1 | < 0.1%
77316 | 1 | < 0.1%
77315 | 1 | < 0.1%
77314 | 1 | < 0.1%
77313 | 1 | < 0.1%

Interactions

[Figures: 81 interaction plots, not reproduced here.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect the magnitude of r; only the direction of the slope determines its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
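
The definition above can be sketched directly in plain Python. This is an illustrative implementation, not the code the report generator uses; the function name `pearson_r` and the sample values (review_overall and review_taste from the first ten sample rows of this report) are chosen here for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (dev_x * dev_y)


# review_overall and review_taste from the ten "First rows" of the sample.
overall = [1.5, 3.0, 3.0, 3.0, 4.0, 3.0, 3.5, 3.0, 4.0, 4.5]
taste = [1.5, 3.0, 3.0, 3.0, 4.5, 3.5, 4.0, 3.5, 4.0, 4.0]
print(round(pearson_r(overall, taste), 3))
```

Ten rows are far too few to say anything about the full dataset; the call only shows how the function is used.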

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at detecting nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
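
As a rough sketch of this definition, the values can be replaced by their ranks and the Pearson formula applied to the ranks. The helper names below are hypothetical, and assigning tied values the average of their rank positions is an assumption of this sketch (a common convention, but not something the report states).

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (dev_x * dev_y)


# A nonlinear but strictly increasing relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```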

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
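
The pair-counting rule just described corresponds to the variant usually called tau-a. Here is a minimal sketch of it in plain Python, written for clarity rather than speed (it compares every pair, so it is quadratic in the number of observations). The function name is hypothetical, and treating tied pairs as neither concordant nor discordant is an assumption of this sketch. Statistical libraries often apply a tie correction (tau-b) instead, so their results can differ from this one when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```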

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

[Figures: four missing-value plots, not reproduced here.]

Sample

First rows

(index) | brewery_id | brewery_name | review_time | review_overall | review_aroma | review_appearance | review_profilename | beer_style | review_palate | review_taste | beer_name | beer_abv | beer_beerid
0 | 10325 | Vecchio Birraio | 1234817823 | 1.5 | 2.0 | 2.5 | stcules | Hefeweizen | 1.5 | 1.5 | Sausa Weizen | 5.0 | 47986
1 | 10325 | Vecchio Birraio | 1235915097 | 3.0 | 2.5 | 3.0 | stcules | English Strong Ale | 3.0 | 3.0 | Red Moon | 6.2 | 48213
2 | 10325 | Vecchio Birraio | 1235916604 | 3.0 | 2.5 | 3.0 | stcules | Foreign / Export Stout | 3.0 | 3.0 | Black Horse Black Beer | 6.5 | 48215
3 | 10325 | Vecchio Birraio | 1234725145 | 3.0 | 3.0 | 3.5 | stcules | German Pilsener | 2.5 | 3.0 | Sausa Pils | 5.0 | 47969
4 | 1075 | Caldera Brewing Company | 1293735206 | 4.0 | 4.5 | 4.0 | johnmichaelsen | American Double / Imperial IPA | 4.0 | 4.5 | Cauldron DIPA | 7.7 | 64883
5 | 1075 | Caldera Brewing Company | 1325524659 | 3.0 | 3.5 | 3.5 | oline73 | Herbed / Spiced Beer | 3.0 | 3.5 | Caldera Ginger Beer | 4.7 | 52159
6 | 1075 | Caldera Brewing Company | 1318991115 | 3.5 | 3.5 | 3.5 | Reidrover | Herbed / Spiced Beer | 4.0 | 4.0 | Caldera Ginger Beer | 4.7 | 52159
7 | 1075 | Caldera Brewing Company | 1306276018 | 3.0 | 2.5 | 3.5 | alpinebryant | Herbed / Spiced Beer | 2.0 | 3.5 | Caldera Ginger Beer | 4.7 | 52159
8 | 1075 | Caldera Brewing Company | 1290454503 | 4.0 | 3.0 | 3.5 | LordAdmNelson | Herbed / Spiced Beer | 3.5 | 4.0 | Caldera Ginger Beer | 4.7 | 52159
9 | 1075 | Caldera Brewing Company | 1285632924 | 4.5 | 3.5 | 5.0 | augustgarage | Herbed / Spiced Beer | 4.0 | 4.0 | Caldera Ginger Beer | 4.7 | 52159

Last rows

(index) | brewery_id | brewery_name | review_time | review_overall | review_aroma | review_appearance | review_profilename | beer_style | review_palate | review_taste | beer_name | beer_abv | beer_beerid
1586604 | 14359 | The Defiant Brewing Company | 1288890206 | 4.0 | 4.5 | 4.5 | njmoons | Pumpkin Ale | 3.5 | 3.5 | The Horseman's Ale | 5.2 | 33061
1586605 | 14359 | The Defiant Brewing Company | 1163291143 | 5.0 | 5.0 | 5.0 | NyackNicky | Pumpkin Ale | 5.0 | 5.0 | The Horseman's Ale | 5.2 | 33061
1586606 | 14359 | The Defiant Brewing Company | 1162871808 | 5.0 | 4.5 | 4.0 | blitheringidiot | Pumpkin Ale | 5.0 | 5.0 | The Horseman's Ale | 5.2 | 33061
1586607 | 14359 | The Defiant Brewing Company | 1162865640 | 5.0 | 5.0 | 4.5 | PopeDX | Pumpkin Ale | 5.0 | 4.5 | The Horseman's Ale | 5.2 | 33061
1586608 | 14359 | The Defiant Brewing Company | 1162685856 | 3.5 | 4.0 | 4.0 | treehugger02010 | Pumpkin Ale | 3.5 | 3.0 | The Horseman's Ale | 5.2 | 33061
1586609 | 14359 | The Defiant Brewing Company | 1162684892 | 5.0 | 4.0 | 3.5 | maddogruss | Pumpkin Ale | 4.0 | 4.0 | The Horseman's Ale | 5.2 | 33061
1586610 | 14359 | The Defiant Brewing Company | 1161048566 | 4.0 | 5.0 | 2.5 | yelterdow | Pumpkin Ale | 2.0 | 4.0 | The Horseman's Ale | 5.2 | 33061
1586611 | 14359 | The Defiant Brewing Company | 1160702513 | 4.5 | 3.5 | 3.0 | TongoRad | Pumpkin Ale | 3.5 | 4.0 | The Horseman's Ale | 5.2 | 33061
1586612 | 14359 | The Defiant Brewing Company | 1160023044 | 4.0 | 4.5 | 4.5 | dherling | Pumpkin Ale | 4.5 | 4.5 | The Horseman's Ale | 5.2 | 33061
1586613 | 14359 | The Defiant Brewing Company | 1160005319 | 5.0 | 4.5 | 4.5 | cbl2 | Pumpkin Ale | 4.5 | 4.5 | The Horseman's Ale | 5.2 | 33061